A ranking-based scoring function for peptide-spectrum matches.
Abstract
The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data now being generated relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to scoring PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state of the art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analysis, such as a proteogenomic search of a six-frame translation of the human genome (in which we achieve a 15-fold reduction in running time and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
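The idea of treating PSM scoring as a ranking task can be illustrated with a minimal sketch. The code below is a simplified RankBoost-style learner over threshold stumps, written only for illustration; the abstract does not give the details of the authors' boosting procedure, so this should not be read as the PepNovo+ implementation. The two features (fraction of matched fragment ions, absolute precursor mass error) and all values in `pairs` are hypothetical toy data.

```python
import math

def train_rank_stumps(pairs, n_features, rounds=5):
    """Learn a scoring model from (better, worse) feature-vector pairs.

    Each round picks the threshold stump h(x) = [x[f] > t] that best
    separates the weighted pairs, then re-weights the pairs so that
    badly ordered ones count more in the next round.
    Returns a list of (alpha, feature_index, threshold) tuples.
    """
    weights = [1.0 / len(pairs)] * len(pairs)
    model = []
    for _ in range(rounds):
        best = None
        for f in range(n_features):
            thresholds = sorted({v[f] for pair in pairs for v in pair})
            for t in thresholds:
                # r is in [-1, 1]; r = 1 means the stump orders every pair correctly.
                r = sum(w * ((b[f] > t) - (c[f] > t))
                        for w, (b, c) in zip(weights, pairs))
                if best is None or abs(r) > abs(best[0]):
                    best = (r, f, t)
        r, f, t = best
        if abs(r) < 1e-12:
            break  # no stump is informative any more
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, f, t))
        # Pairs the stump orders correctly lose weight; misordered pairs gain weight.
        weights = [w * math.exp(-alpha * ((b[f] > t) - (c[f] > t)))
                   for w, (b, c) in zip(weights, pairs)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def score(model, x):
    """Score of a PSM feature vector: weighted vote of the learned stumps."""
    return sum(alpha for alpha, f, t in model if x[f] > t)

def rank_candidates(model, candidates):
    """Sort candidate PSMs for one spectrum from best to worst score."""
    return sorted(candidates, key=lambda x: score(model, x), reverse=True)

# Hypothetical toy data: (matched-ion fraction, |mass error| in ppm).
# In each pair the first vector is the correct PSM, the second an incorrect one.
pairs = [
    ((0.8, 1.0), (0.3, 4.0)),
    ((0.6, 2.0), (0.5, 9.0)),
    ((0.7, 0.5), (0.2, 0.7)),
]
model = train_rank_stumps(pairs, n_features=2)
ranked = rank_candidates(model, [(0.3, 4.0), (0.8, 1.0)])
```

Training on pairs rather than on individually labeled PSMs reflects the ranking formulation: the learner is only asked to place the correct peptide above its competitors for the same spectrum, not to produce a calibrated score.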
Related resources
Learning score function parameters for improved spectrum identification in tandem mass spectrometry experiments.
The identification of proteins from spectra derived from a tandem mass spectrometry experiment involves several challenges: matching each observed spectrum to a peptide sequence, ranking the resulting collection of peptide-spectrum matches, assigning statistical confidence estimates to the matches, and identifying the proteins. The present work addresses algorithms to rank peptide-spectrum matc...
Statistical calibration of the SEQUEST XCorr function.
Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct peptide-spectrum matches (PSMs) above incorrect matches. We have observed that, for the Sequest score function Xcorr, the inability to discriminate between correct and incorrect PSMs is due in part to s...
The Extraction of Influencing Indicators for Scoring of Insurance Companies Branches Based on GMDH Neural Network
One of the key topics, and one of the most important tools for determining the strengths, weaknesses, opportunities and threats of an organization or company, is the evaluation of the performance of organizational activities, in which rating and ranking follow internal and external goals. In this regard, insurance companies are similarly looking to evaluate their branches through scoring, ...
Assigning spectrum-specific P-values to protein identifications by mass spectrometry
MOTIVATION: Although many methods and statistical approaches have been developed for protein identification by mass spectrometry, the problem of accurate assessment of statistical significance of protein identifications remains an open question. The main issues are as follows: (i) statistical significance of inferring peptide from experimental mass spectra must be platform independent and spectr...
Bioinformatics Methods for Protein Identification Using Peptide Mass Fingerprinting
Protein identification by mass spectrometry (MS) is an important technique in proteomics. By searching an MS spectrum against a given protein database, the most matched proteins are sorted using a scoring function and the top one is often considered the correctly identified protein. Peptide mass fingerprinting (PMF) is one of the major methods for protein identification using MS technology. It ...
Journal: Journal of Proteome Research
Volume 8, Issue 5
Pages: -
Publication date: 2009